Towards Parallel Czech-Russian Dependency Treebank
Abstract
In this paper we describe the initial steps in constructing a Czech-Russian dependency treebank and discuss the prospects for its development. Following the experience of the Czech-English Parallel Treebank, we have taken a syntactically annotated “gold standard” text for one language (Russian) and run automatic annotation on the corresponding parallel text for the other language (Czech). Our treebank also includes automatic word alignment.
Similar resources
Prague Czech-English Dependency Treebank: Any Hopes For A Common Annotation Scheme?
The Prague Czech-English Dependency Treebank (PCEDT) is a new syntactically annotated Czech-English parallel resource. The Penn Treebank has been translated into Czech, and its annotation automatically transformed into a dependency annotation scheme. The dependency annotation of Czech is done from plain text by automatic procedures. A small subset of corresponding Czech and English sentences has be...
Bridging Corpus for Russian in comparison with Czech
In this paper, we present a syntactic approach to the annotation of bridging relations, so-called genitive bridging. We introduce the RuGenBridge corpus for Russian annotated with genitive bridging and compare it to the semantic approach that was applied in the Prague Dependency Treebank for Czech. We discuss some special aspects of bridging resolution for Russian and specifics of bridging annot...
Coreference in Prague Czech-English Dependency Treebank
We present coreference annotation on parallel Czech-English texts of the Prague Czech-English Dependency Treebank (PCEDT). The paper describes innovations made to PCEDT 2.0 concerning coreference, as well as the coreference information already present there. We characterize the coreference annotation scheme, give the statistics and compare our annotation with the coreference annotation in Onton...
(Pre-)Annotation of Topic-Focus Articulation in Prague Czech-English Dependency Treebank
The objective of the present contribution is to give a survey of the annotation of information structure in the Czech part of the Prague Czech-English Dependency Treebank. We report on this first step in the process of building a parallel annotation of information structure in this corpus, and elaborate on the automatic pre-annotation procedure for the Czech part. The results of the pre-annotat...
Treebanks in Machine Translation
We present an approach using treebanks in machine translation. Our experiment in Czech-English machine translation is an attempt to develop a full machine translation system based on dependency trees (Dependency Based Machine Translation, DBMT). We use the following resources: Prague Dependency Treebank, a newly created Czech-English parallel corpus of Penn Treebank, English monolingual corpus,...